Conversation
|
Important Installation incomplete: to start using Gemini Code Assist, please ask the organization owner(s) to visit the Gemini Code Assist Admin Console and sign the Terms of Services. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces an example for training SWE-Agent using Miles and Nemo-Gym. The changes are comprehensive, including new documentation, data processing scripts, core integration logic, and run scripts. My review focuses on improving the clarity and correctness of the documentation, enhancing the robustness of the data processing script, and refining the code quality of the integration logic and execution scripts. I've identified a critical issue in the README that would prevent users from running the example, and provided suggestions for better logging, code style, and removing redundancies.
| from miles.utils.types import Sample | ||
| from miles.rollout.sglang_rollout import GenerateState, eval_rollout | ||
| from miles.rollout.filter_hub.base_types import DynamicFilterOutput | ||
| from miles.utils.misc import load_function |
| response = await post(f"{gym_url}/run", request) | ||
|
|
||
| exit_status = response.get("info", {}).get("exit_status", "") | ||
| print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}") |
There was a problem hiding this comment.
It's better to use the configured logger (logger.info or logger.debug) instead of print() for outputting status information. This provides more control over log levels and formatting, which is especially useful in complex systems.
| print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}") | |
| logger.info(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}") |
|
|
||
| if exit_status == "Submitted": | ||
| sample.status = Sample.Status.COMPLETED | ||
| elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued": |
There was a problem hiding this comment.
For better readability, you can use the in operator to check if exit_status is one of several values, instead of using a chain of or conditions.
| elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued": | |
| elif exit_status in ("RolloutTruncated", "LimitsExceeded", "CollapseContinued"): |
| metrics["agent/total_time_max"] = max(values) | ||
| metrics["agent/total_time_min"] = min(values) | ||
|
|
||
| print(f"agent metrics: {metrics}") |
| sleep 3 | ||
| pkill -9 ray | ||
| pkill -9 python |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Thanks. I will take a review on this. @yueming-yuan |
|
I'm able to reproduce this example and you may find the training log here |
| def build_tokens_and_mask_from_messages( | ||
| messages: list[dict], | ||
| tokenizer, | ||
| ) -> tuple[list[int], list[int], str, int]: | ||
|
|
||
| if not messages or len(messages) < 2: | ||
| return [], [], "", 0 | ||
|
|
||
| all_tokens = [] | ||
| loss_mask = [] | ||
| response_text = "" | ||
| prompt_length = 0 | ||
|
|
||
| for i, msg in enumerate(messages): | ||
| content = msg.get("content", "") | ||
| if not content: | ||
| continue | ||
|
|
||
| msg_tokens = tokenizer(content, add_special_tokens=False)["input_ids"] | ||
| all_tokens.extend(msg_tokens) | ||
|
|
||
| if i < 2: | ||
| prompt_length += len(msg_tokens) | ||
| else: | ||
| response_text += content | ||
| if msg["role"] == "assistant": | ||
| loss_mask.extend([1] * len(msg_tokens)) | ||
| else: | ||
| loss_mask.extend([0] * len(msg_tokens)) | ||
|
|
||
| response_length = len(all_tokens) - prompt_length | ||
|
|
||
| return all_tokens, loss_mask, response_text, response_length |
There was a problem hiding this comment.
The code performs frequent list extensions and manual mask handling within a Python loop. While currently on CPU, this becomes a GIL contention point.
| if i < 2: | ||
| prompt_length += len(msg_tokens) | ||
| else: | ||
| response_text += content |
There was a problem hiding this comment.
I replaced the string concatenation (+=) inside the loop with list.append() and "".join().
strings are immutable, so repeated concatenation creates an O(N²) memory copy overhead.
| if i < 2: | ||
| prompt_length += len(msg_tokens) | ||
| else: |
There was a problem hiding this comment.
I decoupled the loop into prompt_msgs and response_msgs using slicing.
This removes the repeated if i < 2 branch check from the iteration loop, which is better for CPU branch prediction and makes the logic for "Prompt vs. Response" explicitly clear.
zhaochenyang20
left a comment
There was a problem hiding this comment.
my comments is fixed by myseldf
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: zijiexia <zijie_xia@icloud.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Add SWE-Agent Training Example with Nemo-Gym Integration
Overview
This PR adds an example for training SWE-Agent using Miles with NVIDIA's Nemo-Gym environment, SWE-Gym dataset, and SWE-bench evaluation.
Changes Summary
examples/swe-agent/*Note: the implementation of this example is partially here, and partially in the above two submodules. Check the codes there for more details.
Key Components
1. Core Integration Module (
generate_with_swe_agent.py)/runendpoint to execute agent trajectories2. Data Processing Utility (
download_and_process_data.py)subsetandsplitfields for Gym API compatibility3. Training Scripts
run-qwen3-4b-instruct.sh: Production training script for Qwen3-4B-Instruct model4. Documentation (
README.md)5. Submodules (
.gitmodules)nemo-gym: The Gym environment provider, launching the environment/agent server. Fork with Miles-specific adaptations, added metrics computation, rollout status handling, sampling params handling, etc.
miles-swe-agentmini-swe-agent: Lightweight SWE-agent implementation. Fork with Miles-specific adaptations, added metrics computation, rollout status handling, sampling params handling, etc.
miles-swe-agentTechnical Details
Integration Architecture
swe-net)Sample Status Handling
COMPLETED: Agent successfully submitted a solution (exit_status == "Submitted")TRUNCATED: Rollout exceeded limits (RolloutTruncated,LimitsExceeded,CollapseContinued)ABORTED: Other failures (filtered out from training viadynamic_filter)Loss Masking Strategy